Scalable Approximate Bayesian Inference for Outlier Detection under Informative Sampling
Author: Terrance D. Savitsky
Abstract
Government surveys of business establishments receive a large volume of submissions, of which a small subset contains errors. Because the time window between collection and reporting is short, analysts need a fast algorithm to flag this subset. We offer a computationally scalable optimization method based on non-parametric mixtures of hierarchical Dirichlet processes that allows discovery of multiple industry-indexed local partitions linked to a set of global cluster centers. Outliers are nominated as those clusters containing few observations. We extend an existing approach with a new "merge" step that reduces sensitivity to hyperparameter settings. Survey data are typically acquired under an informative sampling design, where the probability of inclusion depends on the surveyed response, so that the distribution of the observed sample differs from that of the population. We extend the derivation of a penalized objective function to use a pseudo-posterior that incorporates sampling weights that "undo" the informative design. We provide a simulation study to demonstrate that our approach produces unbiased estimation for the outlying cluster under informative sampling. We apply the method to nominate outliers in the Current Employment Statistics survey conducted by the Bureau of Labor Statistics. ©2016 Terrance D. Savitsky.
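The idea of nominating small clusters as outliers can be illustrated with a much simpler, single-level stand-in. The sketch below is not the paper's hierarchical method and has no "merge" step: it is a plain DP-means-style hard clustering in which a point opens a new cluster when its squared distance to every existing center exceeds a penalty. The names `weighted_dp_means`, `nominate_outliers`, `lam`, and `max_size` are hypothetical, and using the sampling weights only when averaging cluster centers is an assumption of this sketch, not the paper's pseudo-posterior objective.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def weighted_mean(points, weights, idx: List[int]) -> List[float]:
    """Weighted mean of the points selected by idx (weights must be positive)."""
    total = sum(weights[i] for i in idx)
    dim = len(points[idx[0]])
    return [sum(weights[i] * points[i][j] for i in idx) / total for j in range(dim)]


def weighted_dp_means(points, weights, lam: float,
                      max_iter: int = 50) -> Tuple[List[int], List[List[float]]]:
    """DP-means-style hard clustering (simplified stand-in, see lead-in).

    lam is the hypothetical penalty for opening a new cluster. The sampling
    weights are used only in the center update (an assumption of this sketch).
    """
    centers = [weighted_mean(points, weights, list(range(len(points))))]
    prev = None
    labels: List[int] = []
    for _ in range(max_iter):
        labels = []
        for x in points:
            dists = [sq_dist(x, c) for c in centers]
            k = min(range(len(centers)), key=dists.__getitem__)
            if dists[k] > lam:
                # Too far from every center: open a new cluster at this point.
                centers.append(list(x))
                k = len(centers) - 1
            labels.append(k)
        # Drop empty clusters and relabel the rest as 0..K-1.
        used = sorted(set(labels))
        remap = {old: new for new, old in enumerate(used)}
        labels = [remap[k] for k in labels]
        centers = [
            weighted_mean(points, weights,
                          [i for i, k in enumerate(labels) if k == c])
            for c in range(len(used))
        ]
        if labels == prev:  # assignments stopped changing
            break
        prev = labels
    return labels, centers


def nominate_outliers(labels: List[int], max_size: int = 1) -> List[int]:
    """Indices of observations that sit in clusters with at most max_size members."""
    sizes = Counter(labels)
    return [i for i, k in enumerate(labels) if sizes[k] <= max_size]
```

Calling `weighted_dp_means(points, weights, lam)` and then `nominate_outliers(labels)` returns the indices of observations that ended up in very small clusters. Both `lam` and `max_size` would need tuning for real data, and the result of this simplified version depends on the order of the points.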
Similar resources
A Bayesian Approach for Detecting Outliers in ARMA Time Series
The presence of outliers in time series can seriously affect the model specification and parameter estimation. To avoid these adverse effects, it is essential to detect these outliers and remove them from time series. By the Bayesian statistical theory, this article proposes a method for simultaneously detecting the additive outlier (AO) and innovative outlier (IO) in an autoregressive moving-a...
An Approximate Bayesian Long Short-Term Memory Algorithm for Outlier Detection
Long Short-Term Memory networks trained with gradient descent and back-propagation have achieved great success in various applications. However, point estimation of the network weights is prone to over-fitting and lacks important uncertainty information associated with the estimation. Exact Bayesian neural network methods, in turn, are intractable and non-applicable for real-wor...
Posterior Predictive Outlier Detection Using Sample Reweighting
In a Bayesian model, we define an outlier as an observation which is "surprising" relative to its predictive distribution, under the model, given the remainder of the data. Hence "outlyingness" can be measured by the posterior predictive p-value of any interesting scalar summary of the (possibly multivariate) observation. For this calculation, we exclude the case of interest from the data, analo...
Efficient variational Bayesian neural network ensembles for outlier detection
In this work we perform outlier detection using ensembles of neural networks obtained by variational approximation of the posterior in a Bayesian neural network setting. The variational parameters are obtained by sampling from the true posterior by gradient descent. We show our outlier detection results are comparable to those obtained using other efficient ensembling methods.
Journal: Journal of Machine Learning Research
Volume: 17 (issue not given)
Pages: not given
Publication date: 2016